Overview

Dataset statistics

Number of variables: 31
Number of observations: 85485
Missing cells: 886814
Missing cells (%): 33.5%
Duplicate rows: 292
Duplicate rows (%): 0.3%
Total size in memory: 60.6 MiB
Average record size in memory: 743.3 B
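The percentages and the average record size above follow from the raw counts. A minimal sketch of that arithmetic, using only the figures printed in this overview (the variable names are illustrative, and 1 MiB is taken as 1024 * 1024 bytes):

```python
# Figures copied from the dataset statistics above.
n_variables = 31
n_observations = 85485
missing_cells = 886814
duplicate_rows = 292
total_size_mib = 60.6

# Every observation has one cell per variable.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells
duplicate_pct = 100 * duplicate_rows / n_observations
avg_record_bytes = total_size_mib * 1024 * 1024 / n_observations

print(f"Missing cells (%): {missing_pct:.1f}")
print(f"Duplicate rows (%): {duplicate_pct:.1f}")
print(f"Average record size: {avg_record_bytes:.1f} B")
```

The results agree with the reported 33.5%, 0.3% and 743.3 B after rounding to one decimal place.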

Variable types

CAT: 12
NUM: 9
UNSUPPORTED: 6
BOOL: 3
DATE: 1

Reproduction

Analysis started: 2020-04-23 03:29:13.765294
Analysis finished: 2020-04-23 03:30:10.219374
Version: pandas-profiling v2.5.0
Command line: pandas_profiling --config_file config.yaml [YOUR_FILE.csv]
Warnings

Dataset has 292 (0.3%) duplicate rows Duplicates
MUNIC_RES has a high cardinality: 467 distinct values High cardinality
NASC has a high cardinality: 24456 distinct values High cardinality
DIAG_PRINC has a high cardinality: 649 distinct values High cardinality
DIAG_SECUN has a high cardinality: 512 distinct values High cardinality
MUNIC_MOV has a high cardinality: 82 distinct values High cardinality
DIAGSEC1 has a high cardinality: 374 distinct values High cardinality
HOSP has a high cardinality: 109 distinct values High cardinality
BAIRRO_RES has a high cardinality: 386 distinct values High cardinality
VAL_SH is highly correlated with UTI_MES_TO and 1 other field High Correlation
UTI_MES_TO is highly correlated with VAL_SH High Correlation
VAL_SP is highly correlated with VAL_SH High Correlation
DIAG_SECUN has 49174 (57.5%) missing values Missing
DIAGSEC1 has 81899 (95.8%) missing values Missing
DIAGSEC2 has 85384 (99.9%) missing values Missing
DIAGSEC3 has 85468 (> 99.9%) missing values Missing
DIAGSEC4 has 85485 (100.0%) missing values Missing
DIAGSEC5 has 85485 (100.0%) missing values Missing
DIAGSEC6 has 85485 (100.0%) missing values Missing
DIAGSEC7 has 85485 (100.0%) missing values Missing
DIAGSEC8 has 85485 (100.0%) missing values Missing
DIAGSEC9 has 85485 (100.0%) missing values Missing
HOSP has 44442 (52.0%) missing values Missing
BAIRRO_RES has 27537 (32.2%) missing values Missing
UTI_INT_TO is highly skewed (γ1 = 68.98860338) Skewed
NACIONAL is highly skewed (γ1 = 73.98392707) Skewed
NASC only contains datetime values, but is categorical. Consider applying pd.to_datetime() Type
DIAGSEC4 is an unsupported type, check if it needs cleaning or further analysis Rejected
DIAGSEC5 is an unsupported type, check if it needs cleaning or further analysis Rejected
DIAGSEC6 is an unsupported type, check if it needs cleaning or further analysis Rejected
DIAGSEC7 is an unsupported type, check if it needs cleaning or further analysis Rejected
DIAGSEC8 is an unsupported type, check if it needs cleaning or further analysis Rejected
DIAGSEC9 is an unsupported type, check if it needs cleaning or further analysis Rejected
UTI_MES_TO has 72190 (84.4%) zeros Zeros
UTI_INT_TO has 85425 (99.9%) zeros Zeros
DIAS_PERM has 1337 (1.6%) zeros Zeros
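
The Type warning for NASC suggests converting the column with pd.to_datetime(). As a dependency-free sketch of the same idea, the standard library can parse these ISO-formatted strings. The sample values are taken from this report; age_at is a hypothetical helper for illustration, not part of pandas or pandas-profiling:

```python
from datetime import date

def age_at(birth: date, when: date) -> int:
    """Completed years between a birth date and a later date."""
    had_birthday = (when.month, when.day) >= (birth.month, birth.day)
    return when.year - birth.year - (0 if had_birthday else 1)

# NASC values as they appear in the report (stored as strings).
nasc_raw = ["1920-10-12", "1980-02-02", "1993-04-16"]
nasc = [date.fromisoformat(s) for s in nasc_raw]

# With real dates, range checks and date arithmetic become possible.
admission = date.fromisoformat("2012-12-01")  # a DT_INTER value from the sample rows
ages = [age_at(b, admission) for b in nasc]
print(min(nasc), max(nasc), ages)
```

Once the column holds real dates, a profiler can treat it as a date variable rather than a high-cardinality categorical.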

Variables

DT_INTER
Date

Distinct count: 2192
Unique (%): 2.6%
Missing: 0
Missing (%): 0.0%
Memory size: 668.0 KiB
Minimum: 2011-01-01 00:00:00
Maximum: 2016-12-31 00:00:00
Histogram

CEP
Real number (ℝ≥0)

Distinct count: 13728
Unique (%): 16.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 42053148.68476341
Minimum: 1001000
Maximum: 96880970
Zeros: 0
Zeros (%): 0.0%
Memory size: 668.0 KiB

Quantile statistics

Minimum: 1001000
5-th percentile: 40220310
Q1: 40490084
median: 41290370
Q3: 42820000
95-th percentile: 46880000
Maximum: 96880970
Range: 95879970
Interquartile range (IQR): 2329916

Descriptive statistics

Standard deviation: 2290503.501
Coefficient of variation (CV): 0.05446687282
Kurtosis: 53.00897561
Mean: 42053148.68
Mean absolute deviation (MAD): 1630689.389
Skewness: 1.332215592
Sum: 3.594913415e+12
Variance: 5.246406288e+12
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[ 1001000. 7596190. 39805495. 40005495. 40010005. ... 56070000. 56450000. 75813507.5 96692091. 96880970. ], "bayesian blocks" binning strategy used)
Value Count Frequency (%)
43700000 3691 4.3%
 
42700000 2353 2.8%
 
40050410 1915 2.2%
 
44470000 1809 2.1%
 
44460000 1774 2.1%
 
41250000 1758 2.1%
 
42802580 1049 1.2%
 
42850000 989 1.2%
 
40415000 951 1.1%
 
48280000 721 0.8%
 
Other values (13718) 68475 80.1%
 
Value Count Frequency (%)
1001000 1 < 0.1%
 
1310935 1 < 0.1%
 
1509970 1 < 0.1%
 
2325529 1 < 0.1%
 
3310000 2 < 0.1%
 
Value Count Frequency (%)
96880970 1 < 0.1%
 
96880000 2 < 0.1%
 
96504182 1 < 0.1%
 
94170090 1 < 0.1%
 
85823750 1 < 0.1%
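
Several of the CEP statistics above are derived from the others. The sketch below recomputes them from the printed values as a consistency check. It does not claim to reproduce how pandas-profiling computes them internally, and the variable names are illustrative:

```python
# Values copied from the CEP statistics above.
minimum, maximum = 1001000, 96880970
q1, q3 = 40490084, 42820000
mean, std = 42053148.68, 2290503.501
total, n = 3.594913415e12, 85485

value_range = maximum - minimum   # Range
iqr = q3 - q1                     # Interquartile range (IQR)
cv = std / mean                   # Coefficient of variation (CV)
variance = std ** 2               # Variance
mean_from_sum = total / n         # Mean = Sum / number of observations

print(value_range, iqr, round(cv, 6), f"{variance:.4e}", round(mean_from_sum, 2))
```

Small differences in the last digits are expected because the inputs are the rounded values shown in the report.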
 

MUNIC_RES
Categorical

HIGH CARDINALITY
Distinct count: 467
Unique (%): 0.5%
Missing: 0
Missing (%): 0.0%
Memory size: 668.0 KiB
Value Count Frequency (%)
Salvador 58476 68.4%
 
Simões Filho 3735 4.4%
 
Camaçari 3091 3.6%
 
Lauro de Freitas 2431 2.8%
 
Vera Cruz 1831 2.1%
 
Itaparica 1780 2.1%
 
Candeias 1345 1.6%
 
Dias d'Ávila 1023 1.2%
 
Mata de São João 735 0.9%
 
São Sebastião do Passé 620 0.7%
 
Other values (457) 10418 12.2%
 

Length

Max length: 22
Mean length: 8.510896649
Min length: 6
Value Count Frequency (%)
Lowercase_Letter 23 50.0%
 
Uppercase_Letter 11 23.9%
 
Decimal_Number 10 21.7%
 
Space_Separator 1 2.2%
 
Other_Punctuation 1 2.2%
 
Value Count Frequency (%)
Latin 34 73.9%
 
Common 12 26.1%
 
Value Count Frequency (%)
ASCII 41 100.0%
 

NASC
Categorical

HIGH CARDINALITY
TYPE DATE
Distinct count: 24456
Unique (%): 28.6%
Missing: 0
Missing (%): 0.0%
Memory size: 668.0 KiB
Value Count Frequency (%)
1920-10-12 97 0.1%
 
1980-02-02 90 0.1%
 
1993-04-16 63 0.1%
 
1977-07-23 44 0.1%
 
2011-06-13 43 0.1%
 
2010-11-10 40 < 0.1%
 
2011-03-10 38 < 0.1%
 
2011-12-07 37 < 0.1%
 
2013-06-01 36 < 0.1%
 
2011-11-18 35 < 0.1%
 
Other values (24446) 84962 99.4%
 

Length

Max length: 10
Mean length: 10
Min length: 10
Value Count Frequency (%)
Decimal_Number 10 90.9%
 
Dash_Punctuation 1 9.1%
 
Value Count Frequency (%)
Common 11 100.0%
 
Value Count Frequency (%)
ASCII 11 100.0%
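
The character tables above appear to count the distinct characters found in the column, grouped by Unicode property. A rough sketch of the category grouping with the standard unicodedata module, using a few NASC values that occur in this report. The mapping from two-letter category codes to the long names is written out by hand here and is an assumption about how the report labels them:

```python
import unicodedata
from collections import Counter

# A handful of NASC values that appear in this report.
values = ["1920-10-12", "1980-02-02", "1993-04-16", "1977-07-23", "1956-01-07"]

# Long names for the two general categories that occur in ISO dates.
LONG_NAMES = {"Nd": "Decimal_Number", "Pd": "Dash_Punctuation"}

# Count each distinct character once, then group by Unicode general category.
distinct_chars = set("".join(values))
by_category = Counter(
    LONG_NAMES[unicodedata.category(ch)] for ch in distinct_chars
)

for name, count in by_category.most_common():
    print(name, count, f"{100 * count / len(distinct_chars):.1f}%")
```

With these values the grouping gives ten digits and one dash, the same split as the first table in the NASC Length section.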
 

SEXO
Categorical

Distinct count: 2
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 668.0 KiB
Value Count Frequency (%)
1 46028 53.8%
 
3 39457 46.2%
 

Length

Max length: 1
Mean length: 1
Min length: 1
Value Count Frequency (%)
Decimal_Number 2 100.0%
 
Value Count Frequency (%)
Common 2 100.0%
 
Value Count Frequency (%)
ASCII 2 100.0%
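
A value/count/frequency table like the one shown for SEXO is a count per distinct value divided by the number of rows. A small sketch with collections.Counter; the list is rebuilt from the counts printed above rather than read from the original file:

```python
from collections import Counter

# Rebuild the SEXO column from the counts shown above (values are stored as strings).
sexo = ["1"] * 46028 + ["3"] * 39457

counts = Counter(sexo)
n = len(sexo)

# One row per distinct value, most frequent first.
table = [(value, count, round(100 * count / n, 1))
         for value, count in counts.most_common()]

for value, count, pct in table:
    print(value, count, f"{pct}%")
```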
 

UTI_MES_TO
Real number (ℝ≥0)

HIGH CORRELATION
ZEROS
Distinct count: 74
Unique (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.3997660408258759
Minimum: 0
Maximum: 96
Zeros: 72190
Zeros (%): 84.4%
Memory size: 668.0 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 0
Q3: 0
95-th percentile: 9
Maximum: 96
Range: 96
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 4.872623457
Coefficient of variation (CV): 3.481027054
Kurtosis: 39.28939324
Mean: 1.399766041
Mean absolute deviation (MAD): 2.378455923
Skewness: 5.383488951
Sum: 119659
Variance: 23.74245935
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[ 0. 0.5 2.5 3.5 4.5 ... 31.5 38.5 45.5 62.5 96. ], "bayesian blocks" binning strategy used)
Value Count Frequency (%)
0 72190 84.4%
 
1 1531 1.8%
 
2 1530 1.8%
 
3 1277 1.5%
 
7 1071 1.3%
 
4 1055 1.2%
 
5 861 1.0%
 
6 782 0.9%
 
8 555 0.6%
 
9 436 0.5%
 
Other values (64) 4197 4.9%
 
Value Count Frequency (%)
0 72190 84.4%
 
1 1531 1.8%
 
2 1530 1.8%
 
3 1277 1.5%
 
4 1055 1.2%
 
Value Count Frequency (%)
96 1 < 0.1%
 
92 2 < 0.1%
 
90 1 < 0.1%
 
83 1 < 0.1%
 
75 1 < 0.1%
 

UTI_INT_TO
Real number (ℝ≥0)

SKEWED
ZEROS
Distinct count: 23
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.006422179329706966
Minimum: 0
Maximum: 36
Zeros: 85425
Zeros (%): 99.9%
Memory size: 668.0 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 0
Q3: 0
95-th percentile: 0
Maximum: 36
Range: 36
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 0.3297564945
Coefficient of variation (CV): 51.34650989
Kurtosis: 5532.626123
Mean: 0.00642217933
Mean absolute deviation (MAD): 0.01283534349
Skewness: 68.98860338
Sum: 549
Variance: 0.1087393457
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[ 0. 0.5 4.5 16.5 36. ], "bayesian blocks" binning strategy used)
Value Count Frequency (%)
0 85425 99.9%
 
2 9 < 0.1%
 
1 6 < 0.1%
 
3 5 < 0.1%
 
4 5 < 0.1%
 
10 5 < 0.1%
 
6 4 < 0.1%
 
5 3 < 0.1%
 
16 3 < 0.1%
 
11 2 < 0.1%
 
Other values (13) 18 < 0.1%
 
Value Count Frequency (%)
0 85425 99.9%
 
1 6 < 0.1%
 
2 9 < 0.1%
 
3 5 < 0.1%
 
4 5 < 0.1%
 
Value Count Frequency (%)
36 1 < 0.1%
 
31 1 < 0.1%
 
30 1 < 0.1%
 
26 1 < 0.1%
 
25 2 < 0.1%
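
The extreme skewness reported for UTI_INT_TO is typical of a column that is almost entirely zeros with a few large values. A toy illustration using the moment-based skewness g1 = m3 / m2^(3/2). The data are made up, and pandas applies a bias adjustment in its own skew(), so the figures are not expected to match the report:

```python
def skewness(xs):
    """Moment-based sample skewness g1 = m3 / m2**1.5."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in xs) / n  # third central moment
    return m3 / m2 ** 1.5

# Hypothetical zero-inflated column: 99 zeros and one large value.
mostly_zeros = [0] * 99 + [36]
# A symmetric column for comparison.
symmetric = [1, 2, 3, 4, 5]

print(round(skewness(mostly_zeros), 2))  # strongly positive
print(skewness(symmetric))               # 0.0
```

The rarer the non-zero values become, the larger the skewness, which is consistent with a column where 99.9% of the rows are zero.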
 

VAL_SH
Real number (ℝ≥0)

HIGH CORRELATION
Distinct count: 28411
Unique (%): 33.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1300.01740796631
Minimum: 19.03
Maximum: 54514.88
Zeros: 0
Zeros (%): 0.0%
Memory size: 668.0 KiB

Quantile statistics

Minimum: 19.03
5-th percentile: 161.31
Q1: 469.48
median: 545.41
Q3: 880.78
95-th percentile: 5408.978
Maximum: 54514.88
Range: 54495.85
Interquartile range (IQR): 411.3

Descriptive statistics

Standard deviation: 2514.242602
Coefficient of variation (CV): 1.934006873
Kurtosis: 48.90349118
Mean: 1300.017408
Mean absolute deviation (MAD): 1266.011974
Skewness: 5.679082725
Sum: 111131988.1
Variance: 6321415.863
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[1.9030000e+01 2.4750000e+01 3.0970000e+01 3.2405000e+01 3.3840000e+01 ... 1.6104130e+04 2.3234875e+04 2.9394420e+04 4.2177415e+04 5.4514880e+04], "bayesian blocks" binning strategy used)
Value Count Frequency (%)
504.07 7430 8.7%
 
512.07 1691 2.0%
 
528.07 1599 1.9%
 
520.07 1527 1.8%
 
453.48 1516 1.8%
 
451.47 1109 1.3%
 
536.07 1105 1.3%
 
241.31 828 1.0%
 
756.1 730 0.9%
 
160.62 650 0.8%
 
Other values (28401) 67300 78.7%
 
Value Count Frequency (%)
19.03 2 < 0.1%
 
30.47 379 0.4%
 
31.47 25 < 0.1%
 
33.34 332 0.4%
 
34.34 7 < 0.1%
 
Value Count Frequency (%)
54514.88 1 < 0.1%
 
51199.15 1 < 0.1%
 
49784.99 1 < 0.1%
 
49127.87 1 < 0.1%
 
48692.82 1 < 0.1%
 

VAL_SP
Real number (ℝ≥0)

HIGH CORRELATION
Distinct count: 6368
Unique (%): 7.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 219.98002550155
Minimum: 5.1
Maximum: 9685.19
Zeros: 0
Zeros (%): 0.0%
Memory size: 668.0 KiB

Quantile statistics

Minimum: 5.1
5-th percentile: 25.71
Q1: 59.37
median: 78.35
Q3: 183.91
95-th percentile: 959.75
Maximum: 9685.19
Range: 9680.09
Interquartile range (IQR): 124.54

Descriptive statistics

Standard deviation: 413.4403622
Coefficient of variation (CV): 1.879445014
Kurtosis: 44.85623017
Mean: 219.9800255
Mean absolute deviation (MAD): 225.0856727
Skewness: 5.232711053
Sum: 18804992.48
Variance: 170932.9331
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[5.100000e+00 5.585000e+00 6.405000e+00 7.685000e+00 9.030000e+00 ... 3.196975e+03 4.162175e+03 4.813895e+03 6.826480e+03 9.685190e+03], "bayesian blocks" binning strategy used)
Value Count Frequency (%)
78.35 26692 31.2%
 
25.71 4588 5.4%
 
26.51 4131 4.8%
 
29.4 2754 3.2%
 
183.91 2730 3.2%
 
117.52 2460 2.9%
 
24.1 967 1.1%
 
44.1 866 1.0%
 
404.28 834 1.0%
 
11.62 789 0.9%
 
Other values (6358) 38674 45.2%
 
Value Count Frequency (%)
5.1 2 < 0.1%
 
5.58 4 < 0.1%
 
5.59 187 0.2%
 
7.22 1 < 0.1%
 
8.15 41 < 0.1%
 
Value Count Frequency (%)
9685.19 1 < 0.1%
 
8572.16 1 < 0.1%
 
8485.2 1 < 0.1%
 
8435.06 1 < 0.1%
 
8240.89 1 < 0.1%
 

DIAG_PRINC
Categorical

HIGH CARDINALITY
Distinct count: 649
Unique (%): 0.8%
Missing: 0
Missing (%): 0.0%
Memory size: 668.0 KiB
Value Count Frequency (%)
J189 14171 16.6%
 
J159 10564 12.4%
 
J960 8455 9.9%
 
J180 7911 9.3%
 
J188 4357 5.1%
 
J459 3573 4.2%
 
J219 3060 3.6%
 
J353 2483 2.9%
 
J158 2335 2.7%
 
J90 1725 2.0%
 
Other values (639) 26851 31.4%
 

Length

Max length: 4
Mean length: 3.930163187
Min length: 3
Value Count Frequency (%)
Uppercase_Letter 21 67.7%
 
Decimal_Number 10 32.3%
 
Value Count Frequency (%)
Latin 21 67.7%
 
Common 10 32.3%
 
Value Count Frequency (%)
ASCII 31 100.0%
 

DIAG_SECUN
Categorical

HIGH CARDINALITY
MISSING
Distinct count: 512
Unique (%): 1.4%
Missing: 49174
Missing (%): 57.5%
Memory size: 668.0 KiB
Value Count Frequency (%)
0 28274 33.1%
 
Y099 1570 1.8%
 
R060 1117 1.3%
 
Y86 432 0.5%
 
J189 411 0.5%
 
J90 340 0.4%
 
J960 339 0.4%
 
J159 233 0.3%
 
J459 226 0.3%
 
J450 188 0.2%
 
Other values (502) 3181 3.7%
 
(Missing) 49174 57.5%
 

Length

Max length: 4
Mean length: 2.418248816
Min length: 1
Value Count Frequency (%)
Uppercase_Letter 24 66.7%
 
Decimal_Number 10 27.8%
 
Lowercase_Letter 2 5.6%
 
Value Count Frequency (%)
Latin 26 72.2%
 
Common 10 27.8%
 
Value Count Frequency (%)
ASCII 36 100.0%
 

MUNIC_MOV
Categorical

HIGH CARDINALITY
Distinct count: 82
Unique (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 668.0 KiB
Value Count Frequency (%)
Salvador 70398 82.4%
 
Itaparica 3476 4.1%
 
Simões Filho 3062 3.6%
 
Camaçari 2343 2.7%
 
Lauro de Freitas 1855 2.2%
 
Candeias 1421 1.7%
 
Dias d'Ávila 693 0.8%
 
Madre de Deus 630 0.7%
 
Mata de São João 612 0.7%
 
São Sebastião do Passé 440 0.5%
 
Other values (72) 555 0.6%
 

Length

Max length: 22
Mean length: 8.543229806
Min length: 6
Value Count Frequency (%)
Lowercase_Letter 23 50.0%
 
Uppercase_Letter 11 23.9%
 
Decimal_Number 10 21.7%
 
Space_Separator 1 2.2%
 
Other_Punctuation 1 2.2%
 
Value Count Frequency (%)
Latin 34 73.9%
 
Common 12 26.1%
 
Value Count Frequency (%)
ASCII 41 100.0%
 

DIAS_PERM
Real number (ℝ≥0)

ZEROS
Distinct count: 122
Unique (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 8.861285605661813
Minimum: 0
Maximum: 309
Zeros: 1337
Zeros (%): 1.6%
Memory size: 668.0 KiB

Quantile statistics

Minimum: 0
5-th percentile: 1
Q1: 3
median: 5
Q3: 10
95-th percentile: 31
Maximum: 309
Range: 309
Interquartile range (IQR): 7

Descriptive statistics

Standard deviation: 11.23743827
Coefficient of variation (CV): 1.268149879
Kurtosis: 25.71812948
Mean: 8.861285606
Mean absolute deviation (MAD): 7.258957972
Skewness: 3.534329312
Sum: 757507
Variance: 126.2800188
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[ 0. 0.5 1.5 2.5 3.5 ... 97.5 99.5 108. 131. 309. ], "bayesian blocks" binning strategy used)
Value Count Frequency (%)
2 13448 15.7%
 
3 10644 12.5%
 
4 8787 10.3%
 
5 6487 7.6%
 
1 6224 7.3%
 
6 5320 6.2%
 
7 4689 5.5%
 
8 3620 4.2%
 
9 2429 2.8%
 
10 2156 2.5%
 
Other values (112) 21681 25.4%
 
Value Count Frequency (%)
0 1337 1.6%
 
1 6224 7.3%
 
2 13448 15.7%
 
3 10644 12.5%
 
4 8787 10.3%
 
Value Count Frequency (%)
309 1 < 0.1%
 
304 1 < 0.1%
 
186 1 < 0.1%
 
153 1 < 0.1%
 
133 1 < 0.1%
 

MORTE
Boolean

Distinct count: 2
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 668.0 KiB
Value Count Frequency (%)
0 77552 90.7%
 
1 7933 9.3%
 

NACIONAL
Real number (ℝ≥0)

SKEWED
Distinct count: 22
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 10.07590805404457
Minimum: 10
Maximum: 333
Zeros: 0
Zeros (%): 0.0%
Memory size: 668.0 KiB

Quantile statistics

Minimum: 10
5-th percentile: 10
Q1: 10
median: 10
Q3: 10
95-th percentile: 10
Maximum: 333
Range: 323
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 2.95472289
Coefficient of variation (CV): 0.2932463133
Kurtosis: 7162.703482
Mean: 10.07590805
Mean absolute deviation (MAD): 0.1516118752
Skewness: 73.98392707
Sum: 861339
Variance: 8.730387355
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[ 10. 15. 34.5 39.5 44.5 47.5 60.5 109.5 333. ], "bayesian blocks" binning strategy used)
Value Count Frequency (%)
10 85370 99.9%
 
45 51 0.1%
 
71 13 < 0.1%
 
81 8 < 0.1%
 
35 6 < 0.1%
 
39 6 < 0.1%
 
103 4 < 0.1%
 
333 4 < 0.1%
 
37 4 < 0.1%
 
20 3 < 0.1%
 
Other values (12) 16 < 0.1%
 
Value Count Frequency (%)
10 85370 99.9%
 
20 3 < 0.1%
 
21 1 < 0.1%
 
32 1 < 0.1%
 
34 1 < 0.1%
 
Value Count Frequency (%)
333 4 < 0.1%
 
190 1 < 0.1%
 
175 1 < 0.1%
 
110 2 < 0.1%
 
109 3 < 0.1%
 

INSTRU
Boolean

CONSTANT
REJECTED
Distinct count: 1
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 668.0 KiB
Value Count Frequency (%)
0 85485 100.0%
 

INSC_PN
Categorical

Distinct count: 2
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 668.0 KiB
Value Count Frequency (%)
0 85484 > 99.9%
 
2 1 < 0.1%
 

Length

Max length: 1
Mean length: 1
Min length: 1
Value Count Frequency (%)
Decimal_Number 2 100.0%
 
Value Count Frequency (%)
Common 2 100.0%
 
Value Count Frequency (%)
ASCII 2 100.0%
 

CNES
Real number (ℝ≥0)

Distinct count: 131
Unique (%): 0.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1878182.22040124
Minimum: 3786
Maximum: 7223676
Zeros: 0
Zeros (%): 0.0%
Memory size: 668.0 KiB

Quantile statistics

Minimum: 3786
5-th percentile: 3816
Q1: 4065
median: 4294
Q3: 2802104
95-th percentile: 6595197
Maximum: 7223676
Range: 7219890
Interquartile range (IQR): 2798039

Descriptive statistics

Standard deviation: 2318129.465
Coefficient of variation (CV): 1.23424098
Kurtosis: -0.1244778272
Mean: 1878182.22
Mean absolute deviation (MAD): 1948636.584
Skewness: 1.01324121
Sum: 1.605564071e+11
Variance: 5.373724215e+12
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[3.7860000e+03 3.7900000e+03 3.8010000e+03 3.8120000e+03 3.8240000e+03 ... 6.8815105e+06 7.1740220e+06 7.2052555e+06 7.2233155e+06 7.2236760e+06], "bayesian blocks" binning strategy used)
Value Count Frequency (%)
2802104 11905 13.9%
 
4065 9887 11.6%
 
6595197 9507 11.1%
 
4278 5850 6.8%
 
3980 4940 5.8%
 
3859 4642 5.4%
 
3816 4427 5.2%
 
2602083 3476 4.1%
 
2532387 3062 3.6%
 
3832 2992 3.5%
 
Other values (121) 24797 29.0%
 
Value Count Frequency (%)
3786 859 1.0%
 
3794 38 < 0.1%
 
3808 1038 1.2%
 
3816 4427 5.2%
 
3832 2992 3.5%
 
Value Count Frequency (%)
7223676 1 < 0.1%
 
7222955 497 0.6%
 
7187556 105 0.1%
 
7160488 1621 1.9%
 
6602533 27 < 0.1%
 

RACA_COR
Real number (ℝ≥0)

Distinct count: 6
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 72.78035912733228
Minimum: 1
Maximum: 99
Zeros: 0
Zeros (%): 0.0%
Memory size: 668.0 KiB

Quantile statistics

Minimum: 1
5-th percentile: 2
Q1: 3
median: 99
Q3: 99
95-th percentile: 99
Maximum: 99
Range: 98
Interquartile range (IQR): 96

Descriptive statistics

Standard deviation: 42.86226885
Coefficient of variation (CV): 0.588926317
Kurtosis: -0.9529050325
Mean: 72.78035913
Mean absolute deviation (MAD): 38.15918536
Skewness: -1.023149651
Sum: 6221629
Variance: 1837.174091
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[ 1. 2.5 3.5 4.5 52. 99. ], "bayesian blocks" binning strategy used)
Value Count Frequency (%)
99 62206 72.8%
 
3 17586 20.6%
 
2 3761 4.4%
 
1 1592 1.9%
 
4 337 0.4%
 
5 3 < 0.1%
 
Value Count Frequency (%)
1 1592 1.9%
 
2 3761 4.4%
 
3 17586 20.6%
 
4 337 0.4%
 
5 3 < 0.1%
 
Value Count Frequency (%)
99 62206 72.8%
 
5 3 < 0.1%
 
4 337 0.4%
 
3 17586 20.6%
 
2 3761 4.4%
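
RACA_COR has only six distinct values, so its statistics can be recomputed from the frequency table above. In this sketch the value reported as MAD matches the mean absolute deviation around the mean, while the median absolute deviation around the median comes out as 0 because more than half of the rows equal the median (99). The helper weighted_median is hypothetical and written only for this illustration:

```python
# Frequency table for RACA_COR, copied from the report: value -> count.
freq = {99: 62206, 3: 17586, 2: 3761, 1: 1592, 4: 337, 5: 3}

n = sum(freq.values())
mean = sum(v * c for v, c in freq.items()) / n

def weighted_median(pairs, total):
    """Median of data given as (value, count) pairs (upper median if total is even)."""
    seen = 0
    for value, count in sorted(pairs):
        seen += count
        if seen > total / 2:
            return value

median = weighted_median(freq.items(), n)

# Mean absolute deviation around the mean.
mean_abs_dev = sum(abs(v - mean) * c for v, c in freq.items()) / n

# Median absolute deviation around the median.
abs_dev_counts = {}
for v, c in freq.items():
    d = abs(v - median)
    abs_dev_counts[d] = abs_dev_counts.get(d, 0) + c
median_abs_dev = weighted_median(abs_dev_counts.items(), n)

print(n, round(mean, 5), median, round(mean_abs_dev, 5), median_abs_dev)
```

The recomputed count, sum-based mean and median agree with the figures reported for RACA_COR.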
 

ETNIA
Boolean

CONSTANT
REJECTED
Distinct count: 1
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 668.0 KiB
Value Count Frequency (%)
0 85485 100.0%
 

DIAGSEC1
Categorical

HIGH CARDINALITY
MISSING
Distinct count: 374
Unique (%): 10.4%
Missing: 81899
Missing (%): 95.8%
Memory size: 668.0 KiB
Value Count Frequency (%)
Y86 1029 1.2%
 
J960 623 0.7%
 
A419 173 0.2%
 
J342 130 0.2%
 
J159 128 0.1%
 
R579 125 0.1%
 
J969 95 0.1%
 
J189 81 0.1%
 
J353 53 0.1%
 
J158 48 0.1%
 
Other values (364) 1101 1.3%
 
(Missing) 81899 95.8%
 

Length

Max length: 4
Mean length: 3.028285664
Min length: 3
Value Count Frequency (%)
Uppercase_Letter 24 66.7%
 
Decimal_Number 10 27.8%
 
Lowercase_Letter 2 5.6%
 
Value Count Frequency (%)
Latin 26 72.2%
 
Common 10 27.8%
 
Value Count Frequency (%)
ASCII 36 100.0%
 

DIAGSEC2
Categorical

MISSING
Distinct count: 48
Unique (%): 47.5%
Missing: 85384
Missing (%): 99.9%
Memory size: 668.0 KiB
Value Count Frequency (%)
J960 23 < 0.1%
 
J159 9 < 0.1%
 
A419 5 < 0.1%
 
J189 4 < 0.1%
 
J969 4 < 0.1%
 
R570 3 < 0.1%
 
J069 3 < 0.1%
 
B24 3 < 0.1%
 
J180 2 < 0.1%
 
N179 2 < 0.1%
 
Other values (38) 43 0.1%
 
(Missing) 85384 99.9%
 

Length

Max length: 4
Mean length: 3.001006024
Min length: 3
Value Count Frequency (%)
Uppercase_Letter 13 52.0%
 
Decimal_Number 10 40.0%
 
Lowercase_Letter 2 8.0%
 
Value Count Frequency (%)
Latin 15 60.0%
 
Common 10 40.0%
 
Value Count Frequency (%)
ASCII 25 100.0%
 

DIAGSEC3
Categorical

MISSING
Distinct count: 13
Unique (%): 76.5%
Missing: 85468
Missing (%): > 99.9%
Memory size: 668.0 KiB
Value Count Frequency (%)
A419 3 < 0.1%
 
J960 3 < 0.1%
 
A415 1 < 0.1%
 
I269 1 < 0.1%
 
R578 1 < 0.1%
 
J158 1 < 0.1%
 
K740 1 < 0.1%
 
I619 1 < 0.1%
 
R579 1 < 0.1%
 
C349 1 < 0.1%
 
Other values (3) 3 < 0.1%
 
(Missing) 85468 > 99.9%
 

Length

Max length: 4
Mean length: 3.000187167
Min length: 3
Value Count Frequency (%)
Decimal_Number 10 55.6%
 
Uppercase_Letter 6 33.3%
 
Lowercase_Letter 2 11.1%
 
Value Count Frequency (%)
Common 10 55.6%
 
Latin 8 44.4%
 
Value Count Frequency (%)
ASCII 18 100.0%
 

DIAGSEC4
Unsupported

MISSING
REJECTED
UNSUPPORTED
Missing: 85485
Missing (%): 100.0%
Memory size: 668.0 KiB

DIAGSEC5
Unsupported

MISSING
REJECTED
UNSUPPORTED
Missing: 85485
Missing (%): 100.0%
Memory size: 668.0 KiB

DIAGSEC6
Unsupported

MISSING
REJECTED
UNSUPPORTED
Missing: 85485
Missing (%): 100.0%
Memory size: 668.0 KiB

DIAGSEC7
Unsupported

MISSING
REJECTED
UNSUPPORTED
Missing: 85485
Missing (%): 100.0%
Memory size: 668.0 KiB

DIAGSEC8
Unsupported

MISSING
REJECTED
UNSUPPORTED
Missing: 85485
Missing (%): 100.0%
Memory size: 668.0 KiB

DIAGSEC9
Unsupported

MISSING
REJECTED
UNSUPPORTED
Missing: 85485
Missing (%): 100.0%
Memory size: 668.0 KiB

HOSP
Categorical

HIGH CARDINALITY
MISSING
Distinct count: 109
Unique (%): 0.3%
Missing: 44442
Missing (%): 52.0%
Memory size: 668.0 KiB
Value Count Frequency (%)
HOSPITAL SANTO ANTONIO 11905 13.9%
 
HOSPITAL DO SUBURBIO 9507 11.1%
 
HOSPITAL GERAL DE ITAPARICA 3476 4.1%
 
HOSPITAL MUNICIPAL DE SIMOES FILHO 3062 3.6%
 
HOSPITAL GERAL DE CAMACARI 2343 2.7%
 
HOSPITAL GERAL MENANDRO DE FARIA 1855 2.2%
 
HOSPITAL ALAYDE COSTA 1621 1.9%
 
HOSPITAL MUNICIPAL DE CANDEIAS 973 1.1%
 
HOSPITAL 2 DE JULHO 904 1.1%
 
ORTOFORT 769 0.9%
 
Other values (99) 4628 5.4%
 
(Missing) 44442 52.0%
 

Length

Max length: 58
Mean length: 13.58092063
Min length: 3
Value Count Frequency (%)
Uppercase_Letter 24 85.7%
 
Lowercase_Letter 2 7.1%
 
Space_Separator 1 3.6%
 
Decimal_Number 1 3.6%
 
Value Count Frequency (%)
Latin 26 92.9%
 
Common 2 7.1%
 
Value Count Frequency (%)
ASCII 28 100.0%
 

BAIRRO_RES
Categorical

HIGH CARDINALITY
MISSING
Distinct count: 386
Unique (%): 0.7%
Missing: 27537
Missing (%): 32.2%
Memory size: 668.0 KiB
Value Count Frequency (%)
São Marcos 3228 3.8%
 
Nazaré 2154 2.5%
 
Periperi 2047 2.4%
 
Pernambués 2003 2.3%
 
Centro 1602 1.9%
 
Paripe 1308 1.5%
 
Fazenda Grande do Retiro 1278 1.5%
 
Bonfim 1110 1.3%
 
Liberdade 1034 1.2%
 
Fazenda Coutos 1023 1.2%
 
Other values (376) 41161 48.1%
 
(Missing) 27537 32.2%
 

Length

Max length: 46
Mean length: 8.250067263
Min length: 3
Value Count Frequency (%)
Lowercase_Letter 35 49.3%
 
Uppercase_Letter 25 35.2%
 
Decimal_Number 5 7.0%
 
Space_Separator 1 1.4%
 
Open_Punctuation 1 1.4%
 
Dash_Punctuation 1 1.4%
 
Other_Letter 1 1.4%
 
Other_Punctuation 1 1.4%
 
Close_Punctuation 1 1.4%
 
Value Count Frequency (%)
Latin 61 85.9%
 
Common 10 14.1%
 
Value Count Frequency (%)
ASCII 58 100.0%
 

Interactions

Correlations

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation, and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and positive rescaling of the two variables, so for an exactly linear relationship the steepness of the line does not affect the magnitude of r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
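
The calculation described above can be sketched in a few lines of standard-library Python. The function name pearson_r and the data are illustrative only:

```python
import math

def pearson_r(xs, ys):
    """Covariance of xs and ys divided by the product of their standard deviations."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs) / n)
    sy = math.sqrt(sum((y - my) ** 2 for y in ys) / n)
    return cov / (sx * sy)

# Made-up data: y is an exact increasing linear function of x.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2 * v + 1 for v in x]

print(round(pearson_r(x, y), 6))                        # 1.0
# Shifting and positively rescaling a variable leaves r unchanged.
print(round(pearson_r([10 * v - 3 for v in x], y), 6))  # 1.0
# A decreasing linear relationship gives r = -1.
print(round(pearson_r(x, [-v for v in y]), 6))          # -1.0
```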

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at capturing nonlinear monotonic relationships than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation, and +1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
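
As a sketch of that recipe: rank both variables, then apply the Pearson formula to the ranks. Giving tied values the average of their positions is a common convention and an assumption here; the function names and data are illustrative:

```python
import math

def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of equal values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def pearson_r(xs, ys):
    """Pearson's r; the 1/n factors cancel, so plain sums are used."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def spearman_rho(xs, ys):
    """Pearson's r computed on the rank variables."""
    return pearson_r(average_ranks(xs), average_ranks(ys))

# Made-up data: y = x**3 is monotonic but not linear.
x = [1, 2, 3, 4, 5, 6]
y = [v ** 3 for v in x]

print(round(spearman_rho(x, y), 6))  # 1.0
print(pearson_r(x, y) < 1.0)         # True
```

The monotonic but nonlinear relationship gets ρ = 1, while Pearson's r stays below 1.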

Kendall's τ

Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation, and +1 indicating total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
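
A direct sketch of the pair-counting definition above. This is the simple form in which tied pairs count as neither concordant nor discordant; libraries commonly use a tie-adjusted variant, so results can differ on data with many ties. The function name and data are illustrative:

```python
from itertools import combinations

def kendall_tau(xs, ys):
    """(concordant - discordant) / total pairs; ties count as neither."""
    concordant = discordant = 0
    pairs = list(combinations(range(len(xs)), 2))
    for i, j in pairs:
        # Positive when both variables move in the same direction.
        direction = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if direction > 0:
            concordant += 1
        elif direction < 0:
            discordant += 1
    return (concordant - discordant) / len(pairs)

# Made-up data: one of the six pairs is out of order.
x = [1, 2, 3, 4]
y = [1, 3, 2, 4]
tau = kendall_tau(x, y)
print(tau)
```

Here five pairs are concordant and one is discordant, so τ = (5 - 1) / 6.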

Missing values

Sample

First rows

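(index) | DT_INTER | CEP | MUNIC_RES | NASC | SEXO | UTI_MES_TO | UTI_INT_TO | VAL_SH | VAL_SP | DIAG_PRINC | DIAG_SECUN | MUNIC_MOV | DIAS_PERM | MORTE | NACIONAL | INSTRU | INSC_PN | CNES | RACA_COR | ETNIA | DIAGSEC1 | DIAGSEC2 | DIAGSEC3 | DIAGSEC4 | DIAGSEC5 | DIAGSEC6 | DIAGSEC7 | DIAGSEC8 | DIAGSEC9 | HOSP | BAIRRO_RES
0 | 2012-12-01 | 40415000 | Salvador | 1980-02-02 | 1 | 0 | 0 | 2112.03 | 173.29 | J42 | 0 | Salvador | 31 | 0 | 10 | 0 | 0 | 2802104 | 3 | 0.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | HOSPITAL SANTO ANTONIO | Bonfim
1 | 2011-12-28 | 41940560 | Salvador | 1920-10-12 | 3 | 0 | 0 | 2112.03 | 173.29 | J849 | 0 | Salvador | 31 | 0 | 10 | 0 | 0 | 2802104 | 2 | 0.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | HOSPITAL SANTO ANTONIO | Rio Vermelho
2 | 2011-12-28 | 41940560 | Salvador | 1920-10-12 | 3 | 0 | 0 | 2112.03 | 173.29 | J849 | 0 | Salvador | 31 | 0 | 10 | 0 | 0 | 2802104 | 2 | 0.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | HOSPITAL SANTO ANTONIO | Rio Vermelho
3 | 2012-12-01 | 40415000 | Salvador | 1980-02-02 | 1 | 0 | 0 | 2112.03 | 173.29 | J42 | 0 | Salvador | 31 | 0 | 10 | 0 | 0 | 2802104 | 3 | 0.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | HOSPITAL SANTO ANTONIO | Bonfim
4 | 2011-12-28 | 41940560 | Salvador | 1920-10-12 | 3 | 0 | 0 | 2112.03 | 173.29 | J849 | 0 | Salvador | 31 | 0 | 10 | 0 | 0 | 2802104 | 2 | 0.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | HOSPITAL SANTO ANTONIO | Rio Vermelho
5 | 2012-12-01 | 40415000 | Salvador | 1980-02-02 | 1 | 0 | 0 | 2112.03 | 173.29 | J42 | 0 | Salvador | 31 | 0 | 10 | 0 | 0 | 2802104 | 3 | 0.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | HOSPITAL SANTO ANTONIO | Bonfim
6 | 2011-12-28 | 41940560 | Salvador | 1920-10-12 | 3 | 0 | 0 | 2083.84 | 167.70 | J849 | 0 | Salvador | 30 | 0 | 10 | 0 | 0 | 2802104 | 2 | 0.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | HOSPITAL SANTO ANTONIO | Rio Vermelho
7 | 2012-12-01 | 40415000 | Salvador | 1980-02-02 | 1 | 0 | 0 | 2043.90 | 167.70 | J42 | 0 | Salvador | 30 | 0 | 10 | 0 | 0 | 2802104 | 3 | 0.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | HOSPITAL SANTO ANTONIO | Bonfim
8 | 2011-12-28 | 41940560 | Salvador | 1920-10-12 | 3 | 0 | 0 | 2043.90 | 167.70 | J849 | 0 | Salvador | 30 | 0 | 10 | 0 | 0 | 2802104 | 2 | 0.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | HOSPITAL SANTO ANTONIO | Rio Vermelho
9 | 2012-12-01 | 40415000 | Salvador | 1980-02-02 | 1 | 0 | 0 | 2043.90 | 167.70 | J42 | 0 | Salvador | 30 | 0 | 10 | 0 | 0 | 2802104 | 3 | 0.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | HOSPITAL SANTO ANTONIO | Bonfim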

Last rows

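(index) | DT_INTER | CEP | MUNIC_RES | NASC | SEXO | UTI_MES_TO | UTI_INT_TO | VAL_SH | VAL_SP | DIAG_PRINC | DIAG_SECUN | MUNIC_MOV | DIAS_PERM | MORTE | NACIONAL | INSTRU | INSC_PN | CNES | RACA_COR | ETNIA | DIAGSEC1 | DIAGSEC2 | DIAGSEC3 | DIAGSEC4 | DIAGSEC5 | DIAGSEC6 | DIAGSEC7 | DIAGSEC8 | DIAGSEC9 | HOSP | BAIRRO_RES
85475 | 2011-01-25 | 44470000 | Vera Cruz | 1989-01-10 | 3 | 0 | 0 | 504.07 | 78.35 | J180 | NaN | Itaparica | 3 | 0 | 10 | 0 | 0 | 2602083 | 99 | 0.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | HOSPITAL GERAL DE ITAPARICA | NaN
85476 | 2011-01-10 | 44460000 | Itaparica | 2006-04-09 | 1 | 0 | 0 | 504.07 | 78.35 | J180 | NaN | Itaparica | 3 | 0 | 10 | 0 | 0 | 2602083 | 99 | 0.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | HOSPITAL GERAL DE ITAPARICA | NaN
85477 | 2011-01-10 | 44470000 | Vera Cruz | 2006-01-15 | 1 | 0 | 0 | 504.07 | 78.35 | J180 | NaN | Itaparica | 3 | 0 | 10 | 0 | 0 | 2602083 | 99 | 0.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | HOSPITAL GERAL DE ITAPARICA | NaN
85478 | 2011-01-10 | 40255205 | Salvador | 2008-06-16 | 3 | 0 | 0 | 504.07 | 78.35 | J180 | NaN | Itaparica | 3 | 0 | 10 | 0 | 0 | 2602083 | 99 | 0.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | HOSPITAL GERAL DE ITAPARICA | Matatu
85479 | 2011-01-10 | 44470000 | Vera Cruz | 2002-12-21 | 3 | 0 | 0 | 504.07 | 78.35 | J180 | NaN | Itaparica | 3 | 0 | 10 | 0 | 0 | 2602083 | 99 | 0.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | HOSPITAL GERAL DE ITAPARICA | NaN
85480 | 2011-01-05 | 44460000 | Itaparica | 1999-05-29 | 1 | 0 | 0 | 453.48 | 25.71 | J459 | NaN | Itaparica | 2 | 0 | 10 | 0 | 0 | 2602083 | 99 | 0.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | HOSPITAL GERAL DE ITAPARICA | NaN
85481 | 2011-01-09 | 44470000 | Vera Cruz | 1987-11-13 | 1 | 0 | 0 | 504.07 | 78.35 | J180 | NaN | Itaparica | 5 | 0 | 10 | 0 | 0 | 2602083 | 99 | 0.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | HOSPITAL GERAL DE ITAPARICA | NaN
85482 | 2011-01-11 | 44460000 | Itaparica | 1956-01-07 | 3 | 0 | 0 | 504.07 | 78.35 | J180 | NaN | Itaparica | 4 | 0 | 10 | 0 | 0 | 2602083 | 99 | 0.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | HOSPITAL GERAL DE ITAPARICA | NaN
85483 | 2011-01-15 | 44460000 | Itaparica | 1923-04-25 | 1 | 0 | 0 | 504.07 | 78.35 | J158 | NaN | Itaparica | 10 | 0 | 10 | 0 | 0 | 2602083 | 99 | 0.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | HOSPITAL GERAL DE ITAPARICA | NaN
85484 | 2011-01-13 | 44470000 | Vera Cruz | 1961-04-28 | 1 | 0 | 0 | 504.07 | 78.35 | J180 | NaN | Itaparica | 7 | 0 | 10 | 0 | 0 | 2602083 | 99 | 0.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | HOSPITAL GERAL DE ITAPARICA | NaN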